Morphological Analysis of Inflective Languages through Generation
Authors
Abstract
A crucial problem in developing systems for automatic morphological analysis of inflective languages is the treatment of stem alternations. Existing models require rules that specify which stems can be generated from a given one. Many such rules (about a thousand for Russian, for example) have no reasonable linguistic interpretation. We suggest a method that avoids such rules by generating hypotheses about possible grammatical forms and then verifying them. Methods of this type are known as analysis through generation; they make system development much simpler than the standard direct approach. A morphological analysis and generation system for Russian developed with our method is freely available for academic use; a Spanish system is being implemented.
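The idea of analysis through generation can be illustrated with a minimal Python sketch. It is not the authors' system: the three-ending table, the two-verb lexicon, and the names `ENDINGS`, `LEXICON`, `generate`, and `analyze` are all hypothetical, chosen only to show how a hypothesis (lemma plus grammatical form) is proposed from the surface word and then accepted only if generation reproduces that word. The stem variants are stored statically per lexeme, which is an assumption of this sketch.

```python
# Toy data: every entry and name here is an illustrative assumption.
ENDINGS = {"1sg": "o", "2sg": "as", "1pl": "amos"}

# Each lexeme lists its stem variants and which variant each form uses.
LEXICON = {
    "contar": {"stems": ["cont", "cuent"],
               "stem_for": {"1sg": 1, "2sg": 1, "1pl": 0}},
    "cantar": {"stems": ["cant"],
               "stem_for": {"1sg": 0, "2sg": 0, "1pl": 0}},
}

# Index from any stem variant back to the lemmas that own it.
STEM_INDEX = {}
for lemma, entry in LEXICON.items():
    for stem in entry["stems"]:
        STEM_INDEX.setdefault(stem, []).append(lemma)


def generate(lemma, tag):
    """Build the word form of `lemma` for grammatical tag `tag`."""
    entry = LEXICON[lemma]
    stem = entry["stems"][entry["stem_for"][tag]]
    return stem + ENDINGS[tag]


def analyze(word):
    """Return all (lemma, tag) hypotheses that regenerate `word` exactly."""
    results = []
    for tag, ending in ENDINGS.items():
        if not word.endswith(ending):
            continue
        # Hypothesis: the word is a stem followed by this ending.
        stem = word[: len(word) - len(ending)]
        for lemma in STEM_INDEX.get(stem, []):
            # Verification: generate the form and compare with the input.
            if generate(lemma, tag) == word:
                results.append((lemma, tag))
    return results


print(analyze("cuento"))     # [('contar', '1sg')]
print(analyze("contamos"))   # [('contar', '1pl')]
print(analyze("cuentamos"))  # []
```

In this sketch the analyzer contains no rule relating "cuent" to "cont". The incorrect form "cuentamos" is rejected only because generation for the hypothesized lemma and tag yields "contamos" instead.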
Similar resources
Approach to Construction of Automatic Morphological Analysis Systems for Inflective Languages with Little Effort
Developing morphological analysis systems for inflective languages is a tedious and laborious task. We suggest an approach to developing such systems that requires less time and effort. It is based on static processing of stem allomorphs and the method of analysis known as “analysis through generation.” These features allow for using the morphological models oriented to generat...
Morphemic Analysis: A Dictionary Lookup Instead of Real Analysis
This paper presents an approach for developing morphological and morphemic analysis systems for inflective languages based on a simple and fast dictionary lookup instead of any kind of analysis of the input word form. This approach allows the information about the word forms (lemma, tag, morpheme structure, derived words, derivational relations) to be described according to the traditional gram...
Open-Source Tools for Morphology, Lemmatization, POS Tagging and Named Entity Recognition
We present two recently released open-source taggers: NameTag is free software for named entity recognition (NER) which achieves state-of-the-art performance on Czech; MorphoDiTa (Morphological Dictionary and Tagger) performs morphological analysis (with lemmatization), morphological generation, tagging and tokenization with state-of-the-art results for Czech and a throughput around 10-200K wo...
Semi-Automatic Parallel Corpora Extraction from Comparable News Corpora
A parallel corpus is a necessary resource in many multilingual and cross-lingual natural language processing applications, including Machine Translation and Cross-Lingual Information Retrieval. Preparing a large-scale parallel corpus takes time and also demands linguistic skill. In the present work, a technique has been developed that extracts parallel corpus between Manipuri, a morphologicall...
Reduction of Morpho-Syntactic Features in Statistical Machine Translation of Highly Inflective Language
We address the problem of statistical machine translation from a highly inflective language to a less inflective one. The characteristics of inflective languages are generally not taken into account by statistical machine translation systems. Existing translation systems often treat different inflected word forms of the same lemma as if they were independent of each other, although some interdep...
Journal title:
- Procesamiento del Lenguaje Natural
Volume 29, Issue
Pages -
Publication date 2002